THIS STUDY ATTEMPTS TO DEMONSTRATE THAT PATH ANALYSIS IS
A VALUABLE TOOL FOR INTERPRETING CORRELATIONS IN A CAUSAL
SENSE. PATH ANALYSIS IS APPLIED TO A NONEXPERIMENTAL PANEL
SURVEY IN AN EFFORT TO DETERMINE WHETHER THE MORE SELECTIVE
OR LESS SELECTIVE COLLEGES HAD A DIFFERENTIAL IMPACT ON THE
EDUCATIONAL PLANS OF THEIR STUDENTS. THE PROBLEM IS TO
INTERPRET THREE CORRELATIONS: SELECTIVITY WITH CHANGES IN
EDUCATIONAL PLANS, SELECTIVITY WITH COLLEGE GRADES, AND
COLLEGE GRADES WITH CHANGES IN EDUCATIONAL PLANS. DATA FOR
THE STUDY WERE OBTAINED FROM 127,125 ENTERING 1961 FRESHMEN
IN 248 FOUR-YEAR COLLEGES AND UNIVERSITIES. THE GENERAL
PROCEDURE WAS TO CONSTRUCT SIX EQUATIONS USING SEVEN
VARIABLES: FATHER'S EDUCATION, NATIONAL MERIT SCHOLARSHIP
TEST SCORE, HIGH SCHOOL GRADE AVERAGE, FRESHMAN EDUCATIONAL
PLANS, SELECTIVITY OF COLLEGE ATTENDED, FRESHMAN YEAR COLLEGE
GRADES, AND SOPHOMORE EDUCATIONAL PLANS. THE RESULTS SUGGEST
THAT CHANGES IN EDUCATIONAL PLANS ARE A POSITIVE FUNCTION OF
THE DEGREE TO WHICH A STUDENT'S ACADEMIC PERFORMANCE DIFFERS
FROM THAT PREDICTED FROM HIS BACKGROUND AND THE COLLEGE HE
ATTENDS, AND THAT THE DIRECT INFLUENCE OF COLLEGE SELECTIVITY
ON EDUCATIONAL PLANS APPEARS TO BE SMALL OR NONEXISTENT. IT
IS CONCLUDED THAT IT IS EXTREMELY DIFFICULT TO PUT THEORIES
ABOUT COLLEGE ENVIRONMENTS INTO TESTABLE FORM. (HW)
Abstract



In the usual college studies, the investigator frequently has 
to interpret a matrix of intercorrelations between environmental 
variables, student experiences, and changes in student characteristics 
during college. If these correlations are to be used to determine 
how the college environment influences, or is influenced by, the 
students, a number of assumptions must be made about the variables 
under study. This interpretive problem can be handled by path analysis 
a technique which specifies the logical consequences of the assumptions 
To show how path analysis helps to render interpretations explicit, 
consistent, and more susceptible to rejection, a current research 
problem is studied in detail here. 
The Study of College Environments Using Path Analysis

Charles E. Werts 

Thistlethwaite and Wheeler's (1966) well-designed paper on the effects of teacher and peer subcultures upon student aspirations deserves a careful examination because it illustrates the state of the art in studies of college environments. The current minimum standards for such studies (which leave much room for improvement; see Stanley, 1966a) include: precollege measures of relevant background variables (Astin, 1961, 1962); measures of the college environment; and measures of change over time in the student characteristics under study. After controlling for background variables (to deal with differences between colleges in student input), Thistlethwaite and Wheeler found small but statistically significant correlations between changes in aspiration for graduate schooling and several college environment measures. A number of student experiences in college (grades, for example) also correlated significantly (after controls) with educational aspirations. Given these correlations between college environment variables, student experience variables, and changes in aspiration, what can be inferred about how the college environment influences student experiences and changes in aspiration? This interpretative question will be considered here, using for illustration one college environment variable ("selectivity," a measure of the ability level of the students at the college), one student experience variable (college grades), and changes in plans for advanced education during the freshman year of college. We will show how the technique of path analysis can clarify the interpretation by making explicit the logical consequences of assumptions about the variables. Although our purpose is primarily didactic, a current research problem will be studied in detail in order to illustrate the method.

* The author is indebted to Alexander W. Astin, James A. Davis, Otis D. Duncan, Bruce K. Eckland, and Robert C. Nichols for advice and suggestions.

The problem is to interpret three correlations: "selectivity" (abbreviated SELECT) with changes in educational plans (SLOA),¹ "selectivity" with college grades (CG), and college grades with changes in educational plans. "Selectivity" refers specifically to the index devised by Astin (1965a) for the colleges in this study, and is the proportion of high-ability students among entering freshmen at each college. For a sample of 105 colleges, Astin found that "selectivity" correlated .88 with mean Scholastic Aptitude Test (Verbal plus Math) scores of entering freshmen (College Entrance Examination Board, 1961).

"Educational plans" refers to the academic degree the student expects to obtain: less than a baccalaureate, baccalaureate, master's, or doctoral or professional degree (e.g., M.D., LL.B., D.D.S., B.D.).

The choice of "selectivity," college grades, and changes in educational plans for analysis was dictated by a priori, theoretical considerations, which will be discussed later. Unfortunately, since previously collected data were used, it was not possible to choose the most theoretically desirable combination of background factors. Available data included father's education (FaEd), high school grade average (HSG), and National Merit Scholarship Qualifying Test (NMSQT) (SRA, 1966) scores (obtained during the junior year of high school). It was deduced from Thistlethwaite and Wheeler's parallel study on a similar population that results would have been the same even if mother's educational level, number of freshman scholarship applications, family financial resources, and probable major field also had been controlled, since these factors would have contributed little variance to the prediction of aspiration changes beyond that accounted for by initial educational plans and NMSQT scores. Females were analyzed separately. Assuming that the factors influencing the college plans of high school students are similar to those influencing educational plans among college freshmen, Sewell and Armer's (1966) finding that neighborhood context adds little variance to the prediction of college plans independently of sex, ability, and SES suggested negligible error in also failing to control for neighborhood context here. However, the absence of controls for peer group (Coleman, 1961) and high school context (Boyle, 1966), which are known to influence educational aspirations independently, clearly indicates a defect in the present analysis.

1. The abbreviation SLOA is used to indicate changes in educational plans because freshman educational plans (FLOA) are controlled in every case where the meaning of SLOA is discussed. SLOA also refers to the actual measure of educational plans at the start of the sophomore year. It should be clear from the context whether the interpretation of SLOA (with FLOA controlled) or the measure itself is referred to. The letters LOA are a common abbreviation for "level of aspiration," of which educational plans are a subtype.

Before proceeding, it is worthwhile to consider some problems in interpreting the correlations obtained in studies of college effects.

Even though some of the relevant background characteristics are controlled, it may be necessary to make some assumptions about the temporal ordering of the remaining variables. For example, is "selectivity" of the college antecedent to both CG and SLOA? If so, part of the correlation between CG and SLOA is spurious, due to the common antecedent factor, "selectivity." Although failure to control for relevant antecedent variables can distort interpretation, one can make errors just as serious by overcontrolling, that is, by controlling variables which should not be controlled. For example, if CG is an intervening variable between "selectivity" and SLOA (as will be argued later), controlling for CG when interpreting the correlation of "selectivity" with SLOA may be misleading if one assumes that the net association after controlling CG represents the influence of "selectivity" on SLOA, because one may be controlling for the very mechanism by which "selectivity" influences SLOA. The disappearance of a correlation under these circumstances is not evidence of a lack of relationship between "selectivity" and SLOA; it is instead evidence of a mediated influence, with CG as the link that explains the relationship. In general, subsequent variables (i.e., variables causally dependent on the variables under study) should not be controlled. If SLOA is dependent upon both CG and "selectivity" in the above case, controlling for SLOA when studying the correlation of "selectivity" with CG would remove valid variance. Stanley (1966b) warned about controls for concurrent variables (referring specifically to correlated, simultaneously measured variables that are neither cause nor effect of each other), such as equivalent test forms. The prescription is fairly clear in these cases: (a) control for relevant antecedent variables; (b) do not control for concurrent or subsequent variables; (c) control for an intervening variable when you are attempting to describe the manner in which one variable mediates its influence on another; and (d) do not control for any variable unless controlling results in a clear-cut theoretical gain, which usually means the elimination of plausible alternative hypotheses. Methods of handling reciprocal variables, or variables that interact, are too complex for discussion here; the reader is referred to Wright (1960), Blalock (1961, pp. 55-57), and Wold and Jureen (1953, pp. 12-13). Careful consideration of the relationships between all variables studied is essential if reasoned, and hopefully reasonable, interpretations of the correlations are to be made. To quote Eckland (1966):

. . . without some kind of theoretical framework, without some 
logically derived set of expectations or hypotheses regarding 
the nature of the interrelationships between the variables being
observed, it is impossible by any known means to make inferences 
regarding the "importance" of factors. When multiple and partial 
regressions, in general, are used to isolate the effects or 
unique contributions of any factor or set of factors, there must 
be a "causal" model which guides the inclusion or exclusion of 
variables in the regression equations. Without such guidelines, 
spurious correlations cannot logically be separated from "real" 
relationships.
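The danger of overcontrolling can be seen in a small simulation. The sketch below uses Python with NumPy and invents a causal chain in which "selectivity" affects SLOA only through CG; the path values 0.5 and 0.6 are arbitrary assumptions, not estimates from this study. The zero-order correlation of SELECT with SLOA comes out near .30, yet partialling out the intervening variable CG drives it to roughly zero, even though SELECT does influence SLOA in the simulated system.

```python
import numpy as np

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical chain SELECT -> CG -> SLOA with NO direct SELECT -> SLOA path.
# All three variables are standardized (unit variance).
select = rng.standard_normal(n)
cg = 0.5 * select + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)
sloa = 0.6 * cg + np.sqrt(1 - 0.6**2) * rng.standard_normal(n)

r = np.corrcoef([select, cg, sloa])          # rows/cols: SELECT, CG, SLOA
r_select_sloa = r[0, 2]                      # about 0.5 * 0.6 = 0.30
r_select_sloa_given_cg = partial_corr(r[0, 2], r[0, 1], r[1, 2])  # about 0

print(f"r(SELECT, SLOA)      = {r_select_sloa:.3f}")
print(f"r(SELECT, SLOA | CG) = {r_select_sloa_given_cg:.3f}")
```

A vanishing partial correlation here reflects mediation, not the absence of an effect, which is why prescription (c) reserves controls on an intervening variable for describing how an influence is transmitted.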

To facilitate a systematic discussion and analysis, we formulated a series of simultaneous equations that have a one-to-one correspondence with our hypotheses about the nature of the relationships between each pair of variables. The rationale for this approach will not be reviewed; the reader is referred to the original sources (Boudon, 1965; Wright, 1934; Blalock, 1960, 1961, 1962, 1964; Duncan, 1966; and Simon, 1954). The word "cause" frequently will be substituted for "due to," "influences," "produces," or "results in." To quote Blalock (1962): "We shall say that X is a direct cause of Y (written X → Y) if and only if we can produce a change in the mean value of Y by changing X, holding constant all other variables which have been explicitly introduced into the system and which are not causally dependent on Y" (p. 183).

Data Collection 

In the fall of 1961, Astin (1965c) collected data on 127,125 students 
who, with few exceptions, included the entire freshman class entering 
each of 248 four-year colleges and universities. The sample of institutions
was heterogeneous in size, type (e.g., coeducational, public,
private, nondenominational, denominational), and quality indicators
(e.g., Ph.D. productivity). Although neither the sample of institutions
nor that of students was chosen to be representative of any particular 
population, the only significant bias appeared to lie in the exclusion 
of two-year institutions. At the time of registration, each freshman 
filled out a short information form, which included the following ques- 
tions that were used in the present study: 

1. Your high school average (circle one):
   D (1)   C (2)   C+ (3)   B- (4)   B (5)   B+ (6)   A- (7)   A (8)   A+ (9)

2. Highest degree planned (circle one):
   Less than BA (1)   BA or BS (2)   MA or MS (3)
   PhD or EdD (4)   MD or DDS (4)   LLB or BD (4)
   Other: (5)

3. Father's education (circle one):
   Grammar school (1)   Some high school (2)   H.S. grad. (3)
   Some college (4)   College degree (5)   Post-grad degree (6)

4. Circle one:   Male   Female

Follow-up data, which were part of a larger study of the intellectual and social environments of undergraduate institutions (Astin, 1965b), were obtained by mail survey in the summer of 1962. Approximately equal numbers of students at each institution were sent a 12-page questionnaire. In large institutions random sampling was used to select those to whom questionnaires would be sent, and in coeducational institutions equal numbers of males and females were selected. Of the approximately 60,000 questionnaires mailed, about 55 per cent were returned. The percentage of respondents per institution varied from 20 per cent to over 80 per cent, with higher rates for the more prestigious institutions. After questionnaires with large amounts of missing information had been discarded, a sample of 16,141 males and 14,417 females remained. The particular questions applicable to this study were:

1. What is the highest level of education you expect to complete? (circle one)

                                        (code)
   Less than bachelor's degree            1
   B.A. or B.S.                           2
   M.A. or M.S.                           3
   Ph.D. or Ed.D.                         4
   M.D. or D.D.S.                         4
   LL.B.                                  4
   B.D.                                   4
   Other (circle and specify)

2. What is your average grade so far in college? (circle one)

                                        (code)
   A                                      7
   A- or B+                               6
   B                                      5
   B- or C+                               4
   C                                      3
   C- or D+                               2
   D or less                              1
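The coding of the two follow-up items can be written down as lookup tables. This is a minimal Python sketch; the dictionary and function names are hypothetical, and treating the uncoded "Other" response as missing is an assumption, since the questionnaire assigns it no code.

```python
from typing import Optional

# Follow-up item 1 (SLOA): every doctoral or professional degree shares code 4.
SLOA_CODES = {
    "Less than bachelor's degree": 1,
    "B.A. or B.S.": 2,
    "M.A. or M.S.": 3,
    "Ph.D. or Ed.D.": 4,
    "M.D. or D.D.S.": 4,
    "LL.B.": 4,
    "B.D.": 4,
}

# Follow-up item 2 (CG): seven ordered grade categories.
CG_CODES = {
    "A": 7,
    "A- or B+": 6,
    "B": 5,
    "B- or C+": 4,
    "C": 3,
    "C- or D+": 2,
    "D or less": 1,
}

def code_response(table: dict, answer: str) -> Optional[int]:
    """Return the numeric code for an answer, or None (missing) if it has no code."""
    return table.get(answer)

print(code_response(SLOA_CODES, "LL.B."))    # 4
print(code_response(CG_CODES, "B- or C+"))   # 4
print(code_response(SLOA_CODES, "Other"))    # None
```

Collapsing all doctoral and professional degrees into one top code is what makes the educational-plans measure a four-point ordinal scale.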
The files of the National Merit Scholarship Corporation were searched, and National Merit Scholarship Qualifying Test (NMSQT) scores were retrieved for two thirds of the follow-up sample. The NMSQT is a test of educational development administered in the latter part of the junior year in high school as part of a national talent search. This analysis utilized the Composite Score, which is obtained by averaging the scale scores on the five subtests: English Usage, Mathematics Usage, Social Studies Reading, Natural Sciences Reading, and Word Usage. How those who do and those who do not take the NMSQT differ is unknown, since in some high schools the test is mandatory and in others it is administered only to outstanding students.

Although only changes in educational plans occurring during the 
freshman year of college were studied, this is the period when changes 
are most rapid and perhaps most meaningful. Wallace (1965) found that 
change is a function of previous academic achievement, socioeconomic 
ambition, and peer group attitudes. Astin (1965c, p. 112) indicated 
that the present results were not seriously vitiated by reporting 
error, since over a short time (six weeks) there was a test-retest 
correlation of .98 for father’s educational level, .98 for size of 
high school class, and .91 for high school grade average. 

Because it was desirable to use the NMSQT scores in the analysis, it became necessary to ascertain how much the correlations had been distorted by either nonresponse to the questionnaire or nonavailability of NMSQT scores. It was clear that students for whom NMSQT scores were available (N ≈ 20,000) had higher high school grades (mean 5.6 for men, 6.1 for women) than those responding to the follow-up questionnaire (N ≈ 30,000, which included the ≈ 20,000 with NMSQT scores; mean 5.1 for men, 5.8 for women), who, in turn, had higher high school grades (mean 4.7 for men, 5.5 for women) than the original freshman sample (N = 127,125, of whom the ≈ 30,000 questionnaire respondents were a subsample). To estimate the effect of these biases, a number of intercorrelations were computed (using freshman data) for different groups: entering freshmen, all questionnaire respondents (a subsample of freshmen), and questionnaire respondents who had taken the NMSQT. The correlation of father's education (FaEd) with high school grade average (HSG) dropped from .09 for the original 76,015 males to .06 for the questionnaire respondents to .05 for the NMSQT takers. The correlation of HSG with freshman educational plans (FLOA) fell from .33 to .32 to .30 for the respective groups, and the correlation of FaEd with FLOA fell from .22 to .21 to .20. Although the biases produced slightly smaller correlations, the decrease tended to affect all relationships equally, so that the pattern of relationships and the deductions therefrom should not be affected seriously.

The correlations shown in Table 1 were computed on about 2,000 males and 2,000 females. They represent one-eighth and one-seventh random samples of the male and female respondents, respectively. The correlations among FaEd, HSG, and FLOA in this subsample were within .01 of the same correlations on the total group of respondents of the same sex. As a preliminary measure, all correlations were screened to ensure reasonable linearity of regression and generally unimodal distributions, with frequencies decreasing in either direction from the mode. All correlations were computed with a missing data correlational program, and these correlations were used in all additional computations.

Table 1

Correlation Matrix for a Sample of College Freshmen

          FaEd    NMSQT   HSG     FLOA    SELECT  CG      SLOA
FaEd      -       .202    .045    .205    .286    .053    .201
NMSQT     .250    -       .514    .357    .518    .386    .286
HSG       .053    .468    -       .329    .420    .480    .320
FLOA      .122    .265    .230    -       .320    .244    .622
SELECT    .321    .496    .233    .207    -       .126    .228
CG        .060    .365    .525    .138    .033    -       .335
SLOA      .105    .240    .240    .541    .168    .231    -

Note. All correlations were computed using a missing data correlation program. The correlations for males (N = 2,000) are above the diagonal; those for females (N = 2,000) are below the diagonal.

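For readers who want to rework the analysis, Table 1 can be unpacked into two full symmetric correlation matrices, one per sex. The sketch below does this in Python with NumPy (a convenience assumption, not the program used in the study). It also reports the smallest eigenvalue of each matrix, because correlations computed pairwise by a missing-data program are not guaranteed to form a positive definite matrix, and the regression computations behind path coefficients require one.

```python
import numpy as np

VARS = ["FaEd", "NMSQT", "HSG", "FLOA", "SELECT", "CG", "SLOA"]

# Table 1 as printed: males above the diagonal, females below it.
TABLE_1 = np.array([
    [1.000, .202, .045, .205, .286, .053, .201],
    [.250, 1.000, .514, .357, .518, .386, .286],
    [.053, .468, 1.000, .329, .420, .480, .320],
    [.122, .265, .230, 1.000, .320, .244, .622],
    [.321, .496, .233, .207, 1.000, .126, .228],
    [.060, .365, .525, .138, .033, 1.000, .335],
    [.105, .240, .240, .541, .168, .231, 1.000],
])

def split_by_sex(table: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (males, females) as full symmetric correlation matrices."""
    upper = np.triu(table, k=1)      # male correlations
    lower = np.tril(table, k=-1)     # female correlations
    eye = np.eye(table.shape[0])
    return upper + upper.T + eye, lower + lower.T + eye

R_MALE, R_FEMALE = split_by_sex(TABLE_1)

for label, r in (("males", R_MALE), ("females", R_FEMALE)):
    # A positive smallest eigenvalue means the matrix is positive definite.
    print(label, "smallest eigenvalue:", round(float(np.linalg.eigvalsh(r).min()), 3))

i, j = VARS.index("HSG"), VARS.index("SLOA")
print(R_MALE[i, j], R_FEMALE[i, j])   # 0.32 0.24
```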
Construction of the Causal Framework 

The general procedure was to construct six equations using the seven variables under study, namely: (a) $X_1$ = Father's education (FaEd), (b) $X_2$ = National Merit Scholarship Test score (NMSQT), (c) $X_3$ = High school grade average (HSG), (d) $X_4$ = Freshman educational plans (FLOA), (e) $X_5$ = "Selectivity" of college attended (SELECT), (f) $X_6$ = Freshman year college grades (CG), and (g) $X_7$ = Sophomore educational plans (SLOA). The dependent variable always is shown on the left side of the equation, and the independent variables on the right. The presence of an independent variable corresponds to the hypothesis that this variable is a cause of the dependent variable, and that variables not included do not affect this relationship. The justification for including or excluding a variable is solely logical, and hence corresponds to a theoretical premise. In discussing the meaning of each equation, the notation Y ← X (or, equivalently, X → Y) is used, which simply indicates that X is an independent variable which appears on the right side of the equation, and that X is hypothesized to be one cause of the dependent variable, Y. In the following equations each variable will be measured in terms of deviation from its mean; otherwise an additional constant would be needed in the equations.
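Because each equation contains only variables that come earlier in the causal ordering, every equation can be estimated separately by least squares. If, in addition, the variables are standardized (as is usual for path coefficients), the coefficients of an equation depend only on the correlation matrix: they solve $R_{pp}\,a = r_{py}$, where $p$ indexes the hypothesized causes and $y$ the dependent variable, and the residual path is $\sqrt{1 - R^2}$. The Python sketch below assumes standardized variables; the function name and the small three-variable matrix are made up for illustration. With a single cause, as in equation 1, the path coefficient is simply the correlation ($a_{21} = r_{12}$).

```python
import numpy as np

def path_coefficients(r: np.ndarray, dep: int, causes: list[int]) -> tuple[np.ndarray, float]:
    """Standardized path coefficients for one equation of a recursive model.

    r      -- full correlation matrix of all variables
    dep    -- index of the dependent variable
    causes -- indices of its hypothesized causes (all prior in the ordering)
    Returns (coefficients, residual_path) with residual_path = sqrt(1 - R^2).
    """
    r_pp = r[np.ix_(causes, causes)]     # correlations among the causes
    r_py = r[causes, dep]                # correlation of each cause with dep
    a = np.linalg.solve(r_pp, r_py)      # normal equations: r_pp @ a = r_py
    r_squared = float(a @ r_py)
    return a, float(np.sqrt(1.0 - r_squared))

# Made-up correlations for three standardized variables in causal order X1, X2, X3.
r = np.array([
    [1.0, 0.3, 0.4],
    [0.3, 1.0, 0.5],
    [0.4, 0.5, 1.0],
])
a_21, e_2 = path_coefficients(r, dep=1, causes=[0])      # one cause: a_21 = r_12
a_3, e_3 = path_coefficients(r, dep=2, causes=[0, 1])    # a_31 and a_32
print(a_21, e_2)
print(a_3, e_3)
```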



1. $X_2 = a_{21} X_1 + e_2$



The hypothesis here is that NMSQT ($X_2$) is caused by FaEd and residual (implicit) factors ($e_2$) not explicitly included in the causal scheme. The correlation of FaEd with NMSQT undoubtedly summarizes the influence of both genetic and sociocultural processes.² In the sample of NMSQT takers, FaEd correlates more closely with the vocabulary portions than with the other subtests of the NMSQT, possibly because vocabulary is more affected by home background than is achievement in school subjects like mathematics, reading, social studies, and the natural sciences. There is little reason, however, to believe that vocabulary is more "innate" than mathematical ability, despite the fact that verbal ability is the main factor in so-called "intelligence" tests. This investigator agrees with Coleman et al. (1966, p. 293) that ability tests are broader and more general measures of knowledge, while achievement tests are narrower measures directed to a restricted subject matter. From this viewpoint "intelligence" and "achievement" may be only concurrent, overlapping variables and therefore impossible to interpret in a causal manner. Because the NMSQT subtests clearly are also concurrent variables, a composite average of the subtest scores was used in order to avoid confounding the analysis.

2. O. D. Duncan notes that equation 1 is not too plausible, since both FaEd ($X_1$) and NMSQT scores ($X_2$) probably have substantial genetic components. If so, the father's genetic intelligence lies back of $X_1$ and $X_2$ in some complicated way which would render interpretation of $a_{21}$ almost impossible (personal communication, March 24, 1967).

2. $X_3 = a_{31} X_1 + a_{32} X_2 + e_3$

High school grade average ($X_3$) is caused by FaEd ($X_1$), NMSQT ($X_2$), and residual (implicit) factors ($e_3$). Studies by Nichols and Holland (1963) and Davidsen (1963) indicated that self-reported grades correspond highly (r > .90) to school-reported grades, which justifies their use here.

Although FaEd may be assumed to cause HSG for the same reasons that FaEd → NMSQT, it is difficult to assert that NMSQT → HSG, since both are indicators of academic achievement, differing mainly in the method of measurement. HSG may be more affected by teachers' subjective judgments and consequently more affected by the student's relationship with teachers. NMSQT might be partialled out of HSG in order to study these factors (Lavin, 1965). This procedure would be hazardous with our data, since grades from various high schools are noncomparable because of differences in grading practices and in the academic ability of the student bodies.

The main justification for NMSQT → HSG is that NMSQT is determined in the junior year, while HSG is not finalized until high school graduation. As a result, HSG includes academic performance in the senior year, and NMSQT → HSG merely states that past performance influences future performance.

The correlation of FaEd with HSG (.05 for both sexes) is smaller 
than that of FaEd with NMSQT (.20 for men, .25 for women), partly because 
high FaEd students attend more competitive schools and hence get lower 
grades than would be predicted from their test scores. (The correlation 
of FaEd with HSG when NMSQT is partialled out is small and negative.) 
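The parenthetical claim can be checked directly from Table 1 with the first-order partial correlation formula. The short Python sketch below plugs in the printed correlations for each sex; only the helper function name is introduced here. Both partial correlations come out at about -.07, small and negative as stated.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Table 1 values as (r(FaEd, HSG), r(FaEd, NMSQT), r(HSG, NMSQT)).
SAMPLES = {
    "males": (0.045, 0.202, 0.514),
    "females": (0.053, 0.250, 0.468),
}

for sex, (r_fh, r_fn, r_hn) in SAMPLES.items():
    print(f"{sex}: r(FaEd, HSG | NMSQT) = {partial_corr(r_fh, r_fn, r_hn):.3f}")
# males: r(FaEd, HSG | NMSQT) = -0.070
# females: r(FaEd, HSG | NMSQT) = -0.075
```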

The small correlations of FaEd with HSG and NMSQT suggest that among 
college students the ability differences between social class groups 
are small compared with the differences within social class groups. Compared
with similar (moderate-sized) correlations among high school students,
these small correlations apparently result from the fact that nearly all 
high FaEd students go to college, whereas at low FaEd levels only the 
higher ability students enter college. 

3. $X_4 = a_{41} X_1 + a_{42} X_2 + a_{43} X_3 + e_4$

Educational plans upon college entrance (FLOA) are caused by FaEd,
NMSQT, HSG, and other, residual (implicit) factors ($e_4$).

The relationship FaEd → FLOA is now so well documented that attention will be focused on NMSQT → FLOA and HSG → FLOA. Again we must defend the inclusion of these variables in the equation. While no definitive case can be made, the following considerations appear to be relevant:

(a) Freshman educational plans were measured upon college entrance, which normally is three months, and frequently more, after high school graduation (when HSG is finalized), and long after the NMSQT has been administered. (b) The elements of a cross-lag analysis were present in the design. Both HSG and FLOA were obtained from the student on entry to college, and their analogues, CG and SLOA, were obtained one year later. According to the theory (Campbell & Stanley, 1963) of cross-lag analysis, an "effect" (e.g., educational aspiration) should correlate higher with a prior "cause" (e.g., academic performance as measured by HSG) than with a subsequent "cause" (e.g., academic performance as measured by CG). If grades influence educational plans, the correlation of HSG with SLOA should be greater than the correlation of FLOA with CG. The finding that HSG correlates .32 with SLOA for men (.24 for women) and that FLOA correlates .24 with CG for men (.14 for women) supports this hypothesis. The evidence is not definitive, however, since one cannot be certain that HSG and CG are exactly comparable. Rozelle and Campbell (1967) have raised additional questions about the validity of cross-lag analysis. One value of having NMSQT and HSG as prior variables is that NMSQT scores may help to adjust HSG for differences between schools (in academic ability of student bodies and in grading practices), in perhaps the same way that a student may adjust his self-estimate according to the "selectivity" of his high school.
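The cross-lag comparison in (b) amounts to comparing two correlations from Table 1 for each sex. The Python sketch below spells that comparison out; the dictionary and function names are illustrative, and, like the text, it applies no significance test and inherits the caveat that HSG and CG may not be comparable measures.

```python
# Cross-lagged correlations from Table 1, as (r(HSG, SLOA), r(FLOA, CG)).
CROSS_LAG = {
    "males": (0.320, 0.244),
    "females": (0.240, 0.138),
}

def grades_lead_plans(r_prior_cause: float, r_later_cause: float) -> bool:
    """True when the 'effect' correlates more with the prior 'cause' than with the later one."""
    return r_prior_cause > r_later_cause

for sex, (r_hsg_sloa, r_floa_cg) in CROSS_LAG.items():
    print(f"{sex}: r(HSG,SLOA) - r(FLOA,CG) = {r_hsg_sloa - r_floa_cg:+.3f}; "
          f"consistent with grades -> plans: {grades_lead_plans(r_hsg_sloa, r_floa_cg)}")
```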

Equations 1, 2, and 3 fundamentally assume a causal ordering of the variables FaEd, NMSQT, HSG, and FLOA, such that each variable may be influenced only by variables prior to it and, in turn, may influence only subsequent variables. Even if this causal ordering were correct, meaningful results also would depend on the nature of the data collected.

In particular, we want the correlations between the four variables to 
represent the influences of one variable on another without major distor- 
tions from variables outside the causal model. In a sample of college 
students the correlations between these variables are markedly changed 
by college admission requirements. For example, the correlation of FaEd 
with NMSQT or with HSG is much lower among college freshmen than among 
high school students, and therefore radically underestimates the true 
influence of FaEd on NMSQT or on HSG. For this reason, equations 1, 2, 
and 3 will not be used to make causal interpretations. 
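How selection into college can shrink a correlation is easy to demonstrate by simulation. In the Python sketch below, every number is hypothetical: a population correlation of .40 between FaEd and ability is assumed, and "admission" is modeled as a sharp cutoff admitting the upper half on ability, whereas real admission uses both HSG and test scores and is not a sharp cutoff. Within the admitted group the correlation falls to about .25, although the causal relation generating the data is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rho = 0.4   # hypothetical population correlation of FaEd with ability

# Bivariate standard normal "high school population".
faed = rng.standard_normal(n)
ability = rho * faed + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Hypothetical admission rule: only the upper half on ability enters college.
admitted = ability > 0.0

r_population = np.corrcoef(faed, ability)[0, 1]
r_college = np.corrcoef(faed[admitted], ability[admitted])[0, 1]
print(f"population r = {r_population:.2f}, college r = {r_college:.2f}")
```

Selecting directly on one variable restricts its range, which attenuates the correlation; this is the sense in which the freshman correlations underestimate the influence of FaEd.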

4. $X_5 = a_{51} X_1 + a_{52} X_2 + a_{53} X_3 + a_{54} X_4 + e_5$

The "selectivity" of the college ($X_5$) is determined by the student's grades (HSG), test scores (NMSQT), family background (FaEd), and freshman educational plans (FLOA). "Selectivity" may be expected to correlate higher with NMSQT than with HSG because "selectivity" scores were derived from NMSQT data. The rationale for NMSQT → SELECT and HSG → SELECT is simply that, using both test scores and high school grades as screening devices, more selective colleges admit only the more academically able students.

Because the more selective schools generally are more expensive (Astin, 1965c), students from affluent backgrounds will be more likely to attend them; thus FaEd → SELECT. Furthermore, more highly educated parents may have a predilection for the more prestigious institutions (they may have attended one themselves), and often may be willing to sacrifice to give their children the "best possible" start in life.

The hypothesis that FLOA → SELECT (r = .32) is included because
the more selective colleges may prefer students aspiring to
the higher professions, and may structure their admissions policies 
and their curricula to attract them. These students also may feel 
that by attending a more selective, prestigious college their chances 
of eventually getting into graduate or professional school are increased. 

5. $X_6 = a_{61} X_1 + a_{62} X_2 + a_{63} X_3 + a_{64} X_4 + a_{65} X_5 + e_6$

College grades ($X_6$) are influenced by the competitiveness of the
college attended ("selectivity"), prior academic ability (HSG, NMSQT),
freshman educational plans (FLOA), and family background (FaEd).

Again it is hypothesized that future performance depends upon past performance (NMSQT → CG, HSG → CG). Because on entry to college students frequently move away, psychologically and/or physically, from their families, the relationship FaEd → CG may be much weaker than FaEd → HSG. Since some students undoubtedly continue to be influenced by parents, however, FaEd → CG was not deleted.

The hypothesis FLOA → CG suggests that students who aspire to graduate or professional school (high FLOA) may study harder because they realize that good grades are requisite for graduate work. Finally, SELECT → CG is based on the supposition that colleges tend to have the same distribution of grades (Davis, 1966), a supposition consistent with our finding that "selectivity" correlates only .13 with CG for males (.03 for females). Therefore, the student attending a more selective college can be expected to get lower grades than he would have at a less selective college. In other words, Davis hypothesized that when antecedent variables are controlled, the association of SELECT with CG will be negative.
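Davis's argument can be made concrete with a toy simulation. Every quantity below is hypothetical: colleges differ in the mean ability of their students, and each college is assumed to grade on the same curve (grades standardized within college). Under that rule the zero-order correlation of "selectivity" with grades is essentially zero, yet once ability is held constant the coefficient of SELECT is clearly negative, the pattern Davis's hypothesis predicts.

```python
import numpy as np

rng = np.random.default_rng(3)
n_colleges, per_college = 200, 200

college = np.repeat(np.arange(n_colleges), per_college)
select = np.repeat(rng.standard_normal(n_colleges), per_college)  # college "selectivity"
ability = select + rng.standard_normal(select.size)               # abler students at selective colleges
performance = ability + 0.5 * rng.standard_normal(select.size)

# Hypothetical grading rule: every college hands out the same grade distribution,
# i.e. grades are performance standardized within each college ("grading on a curve").
cg = np.empty_like(performance)
for c in range(n_colleges):
    m = college == c
    cg[m] = (performance[m] - performance[m].mean()) / performance[m].std()

r_select_cg = np.corrcoef(select, cg)[0, 1]            # essentially zero
X = np.column_stack([np.ones(select.size), ability, select])
b, *_ = np.linalg.lstsq(X, cg, rcond=None)             # regress CG on ability and SELECT
print(f"r(SELECT, CG) = {r_select_cg:+.3f}")
print(f"coefficient of ability = {b[1]:+.3f}, of SELECT = {b[2]:+.3f}")
```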

6. $X_7 = a_{71} X_1 + a_{72} X_2 + a_{73} X_3 + a_{74} X_4 + a_{75} X_5 + a_{76} X_6 + e_7$

Sophomore educational plans (SLOA) are a function of all prior
variables, including CG, "selectivity" of college, FLOA, HSG, NMSQT,
and FaEd.

The core of the present study is the examination of changes in educational plans during college as a function of "selectivity" (SELECT → SLOA). The hypotheses HSG → SLOA, NMSQT → SLOA, and FaEd → SLOA correspond to controls for background variables. Since variance due to initial level of aspiration is partialled out, FLOA → SLOA in effect means that changes in educational plans are being studied.
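The statement that partialling out FLOA amounts to studying change can be made exact for least-squares regression: when SLOA is regressed on FLOA and other predictors, the coefficients of the other predictors are identical to those obtained by regressing the raw change score, SLOA minus FLOA, on the same predictors; only the FLOA coefficient shifts, by exactly 1. The Python sketch below checks this on made-up data, with a single hypothetical predictor standing in for SELECT.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Made-up data: initial plans (floa), a college variable (select), later plans (sloa).
floa = rng.standard_normal(n)
select = 0.3 * floa + rng.standard_normal(n)
sloa = 0.6 * floa + 0.1 * select + rng.standard_normal(n)

X = np.column_stack([np.ones(n), floa, select])   # intercept, FLOA, SELECT

b_level, *_ = np.linalg.lstsq(X, sloa, rcond=None)          # SLOA on FLOA and SELECT
b_change, *_ = np.linalg.lstsq(X, sloa - floa, rcond=None)  # (SLOA - FLOA) on FLOA and SELECT

print("SELECT coefficient, level model: ", b_level[2])
print("SELECT coefficient, change model:", b_change[2])
print("FLOA coefficients differ by:     ", b_level[1] - b_change[1])
```

The identity follows because FLOA is itself one of the regressors, so subtracting it from the dependent variable changes only its own coefficient.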

The literature contains at least two different hypotheses about the nature of the relationship between "selectivity" and changes in educational plans. The theory of "relative deprivation" (Davis, 1966) predicts that, assuming the student's educational plans generally tend to rise, they will rise less at the highly selective colleges than they will at the unselective colleges, whereas the "environmental press" theory (Thistlethwaite & Wheeler, 1966) predicts just the opposite. One cannot, strictly speaking, infer from our correlations that any rise in aspiration takes place. However, because the mean SLOA is greater than the mean FLOA, and because Wallace's (1965) results indicated a rapid rise in aspiration during the freshman year, we will talk as if this were the case.
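The two theories disagree about the sign of the direct path from SELECT to SLOA, so it helps to see how path analysis separates that direct path from the part of SELECT's influence that runs through CG. The Python sketch below estimates equations 5 and 6 for males from Table 1 and checks the standard decomposition: the coefficient of SELECT in a regression of SLOA that omits CG equals the direct path $a_{75}$ plus the indirect path $a_{65} a_{76}$, where $a_{65}$ is also the coefficient Davis's SELECT → CG hypothesis concerns. It assumes standardized variables and treats the pairwise correlations as if they came from complete data; the printed values are illustrations of the arithmetic, not results reported in this paper.

```python
import numpy as np

VARS = ["FaEd", "NMSQT", "HSG", "FLOA", "SELECT", "CG", "SLOA"]

# Male correlations from Table 1 (upper triangle), written out as a full matrix.
R_MALE = np.array([
    [1.000, .202, .045, .205, .286, .053, .201],
    [.202, 1.000, .514, .357, .518, .386, .286],
    [.045, .514, 1.000, .329, .420, .480, .320],
    [.205, .357, .329, 1.000, .320, .244, .622],
    [.286, .518, .420, .320, 1.000, .126, .228],
    [.053, .386, .480, .244, .126, 1.000, .335],
    [.201, .286, .320, .622, .228, .335, 1.000],
])

def coefs(r: np.ndarray, dep: int, causes: list[int]) -> np.ndarray:
    """Standardized regression (path) coefficients of dep on causes."""
    return np.linalg.solve(r[np.ix_(causes, causes)], r[causes, dep])

idx = {v: i for i, v in enumerate(VARS)}
prior = [idx[v] for v in ["FaEd", "NMSQT", "HSG", "FLOA", "SELECT"]]

a6 = coefs(R_MALE, idx["CG"], prior)                   # equation 5
a7 = coefs(R_MALE, idx["SLOA"], prior + [idx["CG"]])   # equation 6
reduced = coefs(R_MALE, idx["SLOA"], prior)            # equation 6 with CG omitted

a65, a75, a76 = a6[4], a7[4], a7[5]
print(f"a65, SELECT -> CG:             {a65:+.3f}")
print(f"direct, SELECT -> SLOA:        {a75:+.3f}")
print(f"indirect, SELECT -> CG -> SLOA: {a65 * a76:+.3f}")
print(f"SELECT coefficient without CG: {reduced[4]:+.3f}")   # equals direct + indirect
```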



Although Davis studied career decisions, his results should apply to educational plans also, since career preferences are associated closely with decisions to attend graduate school. The coding of educational plans here (including M.D., LL.B., Ph.D., B.D., and D.D.S. in the highest category) should sort people into a category much like Davis' "high academic-performance career fields," which comprises the physical sciences, biological sciences, social sciences, humanities and fine arts, and law and medicine. Davis (1966) reasoned as follows:

The theory of relative deprivation suggests the following
interpretation of our data: (a) In making career decisions
regarding the high-performance fields (which generally require
graduate training), the student's judgment of his own academic
ability plays an important role. (b) In the absence of any
objective evidence, students tend to evaluate their academic
abilities by comparison with other students. (c) Most of the
other students one knows are those on one's own campus, and
since GPA's (Grade Point Averages) are reasonably public
information, they become the accepted yardstick. (d) Comparisons
across campuses are relatively rare, and where they take
place it is difficult to arrive at an unambiguous conclusion
because institutional differences are not well publicized;
even when these differences are known, there is no convenient
scale comparable to GPA for drawing conclusions. (e) Since
more conclusions are drawn on the basis of GPA standing on
the local campus than by comparison with students on other
campuses, GPA is a more important variable in influencing
self-evaluations and, consequently, career decisions (p. 25).

Davis' study overlaps ours considerably, in both the logic and the
types of variables used. His School Quality is identical with "selec-
tivity," his GPA corresponds to CG (since grade point average is a linear
function of grades), and, as noted before, his coding of Career Aspiration
sorts people in much the same way as the FLOA and SLOA educational plans
codes do here. Davis' theory implies the developmental sequence:

SELECT→CG→subjective feeling of academic success→SLOA (where SLOA
actually means changes in plans with FLOA controlled). This sequence
suggests that students who attend highly selective colleges will obtain
lower grades than they would have at less selective colleges (SELECT→CG)
and that their subjective reaction to lower grades (CG→subjective feeling)
will be one of relative academic failure, causing them to lower their
aspirations (subjective feeling→SLOA). When each variable in a devel-
opmental sequence is influenced only by the variable immediately preced-
ing it in the chain (Blalock, 1961), controls for any one of the intervening
variables will reduce the association between the initial and the final
variable to zero. Consequently, when studying the association of "selec-
tivity" with SLOA, controls for CG (a) will remove the influence of
"relative deprivation," because the mechanism for its effect (viz. the
intervening variable, CG) is controlled, and (b) will not remove the
influence of "environmental press," because that influence, as noted
below, is not mediated through CG.
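Blalock's point about chains can be checked numerically. The sketch below assumes a hypothetical three-variable standardized chain X → Y → Z with made-up path values (not study data). In such a chain each adjacent correlation equals its path coefficient and $r_{XZ}$ is their product, so the first-order partial correlation of X and Z with Y held constant is exactly zero. `partial_corr` is an illustrative helper name.

```python
import math


def partial_corr(r_xz: float, r_xy: float, r_yz: float) -> float:
    """First-order partial correlation of X and Z with Y held constant."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


# Hypothetical standardized chain X -> Y -> Z (illustrative values, not study data).
a, b = 0.4, 0.3           # path coefficients X -> Y and Y -> Z
r_xy, r_yz = a, b         # each link's correlation equals its path coefficient
r_xz = a * b              # X and Z are associated only through Y

print(round(r_xz, 3))                            # 0.12: a nonzero zero-order association
print(round(partial_corr(r_xz, r_xy, r_yz), 6))  # 0.0: it vanishes once Y is controlled
```

The same reasoning extends to longer chains such as SELECT→CG→subjective feeling→SLOA: controlling any single intervening link removes the association that runs only through that chain.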

The "environmental press" theory implies that the student will be
influenced by the demands, expectations, and activities most character-
istic of teachers and other students at his college. The theory suggests
a trend towards homogeneity, with deviants tending to become more like
the majority in their aspirations. Because "intellectual" atmosphere is
associated closely with "selectivity" (Astin, 196_), the hypothesis is
that the more intellectual atmosphere of the more selective college will
produce a relatively greater rise in the educational plans of its students.
In the absence of compelling evidence from the literature, it is possible
to suggest many hypotheses other than those presented above: for example,
perhaps there is a ceiling effect on educational plans, or perhaps in the
more selective colleges only the inferior students are affected by "relative
deprivation," the more able students being "pressed" towards the majority
position.

Method 

The set of six simultaneous equations constructed from theoretical
considerations usually is referred to as a recursive system. The
characteristic of a recursive system is that the regression coefficients
$a_{ij}$ (where $i > j$) appear in the set, but the corresponding coefficients
$a_{ji}$ do not. A useful property of a recursive system is that least squares
analysis can be used to obtain unbiased estimates of the regression coef-
ficients (given that each residual factor is uncorrelated with the
independent variables in its equation). Therefore an ordinary regression
program can be used to calculate the regression coefficients by computer.
Since this analysis was based on standardized regression coefficients, which
are easier to interpret than unstandardized coefficients, the "normal"
equations (Walker & Lev, 1953, pp. 324-326) were used to obtain the
standardized regression coefficients directly from the correlation
coefficients in Table 1. The "normal" equations then were solved by using
a standard computer program for the solution of simultaneous equations.
Some inexactness resulted from the use of correlations computed from
incomplete data; however, the error appeared trivial. The standardized,
partial regression coefficients are given in Table 2. The causal model
represented by equations 1 through 6 is a special case of path analysis
(Duncan, 1966); its distinctive features are that there are no unmeasured
variables, other than the residual or "implicit" factors, and that each
variable is directly related to all variables preceding it. Under these
circumstances path analysis amounts to conventional regression analysis,
and the standardized, partial regression coefficients are identical with
the path coefficients of path analysis. Consequently, the term "path
coefficients" (denoted $p_{ij}$) will be used interchangeably with the term
"standardized, partial regression coefficients" in the remainder of the
discussion.
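The step of obtaining standardized coefficients directly from correlations can be sketched in a few lines. This is a minimal illustration, not the program the authors used. `solve` is a hypothetical helper doing Gaussian elimination with partial pivoting. Because Table 1 is not reproduced in this section, the correlations below are stand-ins: the values implied by the male coefficients for equations 1-3 in Table 2. Solving the normal equations for equation 3 should therefore recover $p_{41}$, $p_{42}$, and $p_{43}$.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


# Male path coefficients for equations 1-3 (Table 2): X1=FaEd, X2=NMSQT, X3=HSG, X4=FLOA.
p21, p31, p32 = 0.202, -0.061, 0.526
p41, p42, p43 = 0.151, 0.219, 0.210

# Stand-in for Table 1: correlations implied by these coefficients via the normal equations.
r12 = p21
r13 = p31 + r12 * p32
r23 = r12 * p31 + p32
r14 = p41 + r12 * p42 + r13 * p43
r24 = r12 * p41 + p42 + r23 * p43
r34 = r13 * p41 + r23 * p42 + p43

# Normal equations for equation 3: the predictor correlation matrix times the
# standardized weights equals the predictors' correlations with X4.
R_xx = [[1.0, r12, r13],
        [r12, 1.0, r23],
        [r13, r23, 1.0]]
weights = solve(R_xx, [r14, r24, r34])
print([round(w, 3) for w in weights])  # recovers p41, p42, p43
print(round(r24, 3))                   # 0.357, the NMSQT-FLOA correlation cited later
```

With observed rather than model-implied correlations, the recovered weights would be the estimates themselves; a library routine such as a general linear-system solver would serve equally well.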

Standardized, partial regression coefficients may be compared directly
with each other: that is, if one is twice the size of another, it is twice
as important in predicting (estimating) the dependent variable (Steel &
Torrie, 1960, p. 28_).³ In a recursive system where the residual (implicit)
factors are uncorrelated, Boudon (1965) proved a further theorem: the
standardized, partial regression coefficient is a measure of the direct
influence of one variable on another. "Direct" in this sense is defined
as that influence which remains after all other independent variables in
the causal equation have been controlled. Therefore the direct influence
will equal the total influence of one variable on another only when there
are no intervening variables.

3. This interpretation of "importance in prediction" should not be
confused with "percentage of variance accounted for." It is more like
(but not identical with) the sense in which partial correlations are
interpreted.

Table 2

Calculated Path Coefficients for All Relationships

                                        Standardized Partial Regression
                                        Weights (path coefficients)
Independent   Dependent
Variable      Variable     Symbol         Males        Females

FaEd          SLOA         p71             .084          .018
NMSQT         SLOA         p72            -.025          .019
HSG           SLOA         p73             .068          .037
FLOA          SLOA         p74             .552          .499
SELECT        SLOA         p75            -.010          .037
CG            SLOA         p76             .174          .134

FaEd          CG           p61             .028          .041
NMSQT         CG           p62             .248          .243
HSG           CG           p63             .413          .456
FLOA          CG           p64             .081          .007
SELECT        CG           p65            -.210         -.208

FaEd          SELECT       p51             .189          .207
NMSQT         SELECT       p52             .341          .422
HSG           SELECT       p53             .206          .009
FLOA          SELECT       p54             .092          .068

FaEd          FLOA         p41             .151          .069
NMSQT         FLOA         p42             .219          .182
HSG           FLOA         p43             .210          .141

FaEd          HSG          p31            -.061         -.068
NMSQT         HSG          p32             .526          .485

FaEd          NMSQT        p21             .202      (illegible)

Note. The abbreviations are: FaEd = Father's Education, NMSQT = National
Merit Scholarship Qualifying Test Score, HSG = High School Grade Average,
CG = Freshman College Grade Average, FLOA = Freshman Educational Plans,
SLOA = Sophomore Educational Plans, and SELECT = "Selectivity" of College.

In order to clarify the rationale for regression analysis in terms
of the relationships postulated previously, equations 1 through 3 will be
analyzed in detail. This analysis is presented only for didactic purposes,
for the reasons noted earlier. Figure 1 shows the complete net of causal
relationships postulated for equations 1, 2, and 3. In least squares
analysis the error term, corresponding to the residual factors in equations
1 through 6, is assumed to be uncorrelated with the independent variables
in the regression equations. As Blalock (1960) pointed out, this is a
weaker assumption than that used in many partial correlation studies,
where it is assumed that all relevant variables have been controlled.

Ideally one should bring as many of the outside, disturbing influences
as possible into the analysis as explicit (i.e. not residual) variables,
in order to minimize interpretational distortions resulting from the
nonindependence of the residual factor.

The causal network in Fig. 1 is an explicit representation of equations
1, 2, and 3, and amounts to saying that these relationships will account
completely for the observed correlations between all variables. The
correlation of HSG ($X_3$) with FLOA ($X_4$) will be the result of a variety of
influences, viz.: (a) The direct influence of $X_3$ on $X_4$ ($X_3 \to X_4$) will
create an association between $X_3$ and $X_4$. (b) The antecedent variable, FaEd
($X_1$), may contribute some spurious association between $X_3$ and $X_4$ because
of its direct influence on both these variables ($X_4 \leftarrow X_1 \to X_3$).
(c) The antecedent variable, NMSQT ($X_2$), likewise may contribute some spurious
association between $X_3$ and $X_4$ because of its direct influence on both these
variables ($X_4 \leftarrow X_2 \to X_3$). (d) Since $X_1$ influences $X_4$ through the
intervening variable $X_2$ ($X_1 \to X_2 \to X_4$), this mediated influence could
combine with the direct influence of $X_1$ on $X_3$ ($X_1 \to X_3$) to create an
additional, spurious association between $X_3$ and $X_4$
($X_3 \leftarrow X_1 \to X_2 \to X_4$). And (e) since $X_1$ influences $X_3$ through the
intervening variable $X_2$ ($X_1 \to X_2 \to X_3$), this mediated influence might
combine with the direct influence of $X_1$ on $X_4$ ($X_1 \to X_4$) to create an
additional, spurious association between $X_3$ and $X_4$
($X_4 \leftarrow X_1 \to X_2 \to X_3$).

Because the role of intervening and antecedent factors has been
commonly discussed in the literature, it may not be too difficult to
understand sources (a), (b), and (c) ($X_3 \to X_4$; $X_4 \leftarrow X_1 \to X_3$; and
$X_4 \leftarrow X_2 \to X_3$) of association between $X_3$ and $X_4$ implied in Fig. 1.
However, the more indirect sources of spuriousness ($X_3 \leftarrow X_1 \to X_2 \to X_4$
and $X_4 \leftarrow X_1 \to X_2 \to X_3$) are less obvious. Sources of association between
any pair of variables in equations 1 through 6 can be detailed by using the
normal equations of regression analysis, in which the standardized,
partial regression coefficients are stated in terms of the known corre-
lation coefficients between variables. The normal equations correspond-
ing to equations 1, 2, and 3 above (the $p_{ij}$ are the standardized versions of
the corresponding regression coefficients, $a_{ij}$, in equations 1, 2, and 3)
are:



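$$
\begin{aligned}
r_{12} &= p_{21} \\
r_{13} &= p_{31} + r_{12}\,p_{32} \\
r_{23} &= r_{12}\,p_{31} + p_{32} \\
r_{14} &= p_{41} + r_{12}\,p_{42} + r_{13}\,p_{43} \\
r_{24} &= r_{12}\,p_{41} + p_{42} + r_{23}\,p_{43} \\
r_{34} &= r_{13}\,p_{41} + r_{23}\,p_{42} + p_{43}
\end{aligned}
$$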
First, note that the $p_{ij}$ are unknowns in these equations, but can be solved
for since there are as many equations as unknowns. Strictly speaking, the
normal equations have additional terms, including the residual factors.
These terms drop out because each residual factor has been specifically
assumed to be uncorrelated with the independent variables in the least
squares analysis. To determine the sources of association between $X_3$ and
$X_4$, the procedure is:



Step 1. Start with the equation for $r_{34}$:

$$r_{34} = r_{13}\,p_{41} + r_{23}\,p_{42} + p_{43}$$

Step 2. Substitute the equations $r_{13} = p_{31} + r_{12}\,p_{32}$ and $r_{23} = r_{12}\,p_{31} + p_{32}$
to obtain:

$$
\begin{aligned}
r_{34} &= p_{41}\,(p_{31} + r_{12}\,p_{32}) + p_{42}\,(r_{12}\,p_{31} + p_{32}) + p_{43} \\
       &= p_{31}\,p_{41} + r_{12}\,p_{32}\,p_{41} + r_{12}\,p_{31}\,p_{42} + p_{32}\,p_{42} + p_{43}
\end{aligned}
$$

Step 3. Substitute the equation $r_{12} = p_{21}$ to obtain:

$$r_{34} = p_{31}\,p_{41} + p_{21}\,p_{32}\,p_{41} + p_{21}\,p_{31}\,p_{42} + p_{32}\,p_{42} + p_{43}$$



In examining the terms on the right-hand side of the equation relating $r_{34}$
to the $p_{ij}$, several analogies can be drawn: (a) $p_{43}$ represents the
direct influence of $X_3$ on $X_4$; (b) $p_{31}\,p_{41}$ represents the
spurious association contributed by the antecedent factor, $X_1$, because
of its direct influence on $X_3$ ($p_{31}$) and on $X_4$ ($p_{41}$), or symbolically,
$X_4 \leftarrow X_1 \to X_3$; (c) $p_{32}\,p_{42}$ represents the spurious association
contributed by the antecedent factor, $X_2$, because of its direct influence on $X_3$
($p_{32}$) and on $X_4$ ($p_{42}$), or symbolically, $X_4 \leftarrow X_2 \to X_3$; (d) $p_{21}\,p_{31}\,p_{42}$
represents the spurious association due to the direct influence of $X_1$ on
$X_3$ ($p_{31}$), combined with the influence of $X_1$ on $X_4$ mediated through $X_2$,
i.e. $X_3 \leftarrow X_1 \to X_2 \to X_4$; and (e) $p_{21}\,p_{32}\,p_{41}$ represents the spurious
association due to the direct influence of $X_1$ on $X_4$ ($p_{41}$), combined with
the mediated influence of $X_1$ on $X_3$ through the intervening variable, $X_2$.
Each term in the equation for $r_{34}$ corresponds to a separate source of
association, which allows one to specify precisely the sources of associ-
ation between two variables (the sources implied by the use of regression
analysis).
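As a numeric check on the Step 3 expansion, the sketch below evaluates each of the five sources (a) to (e) with the male coefficients from Table 2 and compares their sum with the Step 1 form of the normal equation. Because Table 1 is not reproduced in this section, the $r_{34}$ computed here is the value implied by the model, not the observed correlation.

```python
# Male path coefficients for equations 1-3 (Table 2): X1=FaEd, X2=NMSQT, X3=HSG, X4=FLOA.
p21, p31, p32 = 0.202, -0.061, 0.526
p41, p42, p43 = 0.151, 0.219, 0.210

# The five terms of r34 after Step 3, labeled by the source each represents.
sources = {
    "(a) direct X3->X4": p43,
    "(b) X4<-X1->X3": p31 * p41,
    "(c) X4<-X2->X3": p32 * p42,
    "(d) X3<-X1->X2->X4": p21 * p31 * p42,
    "(e) X4<-X1->X2->X3": p21 * p32 * p41,
}

# The same correlation from the Step 1 form, using the equations for r12, r13, r23.
r12 = p21
r13 = p31 + r12 * p32
r23 = r12 * p31 + p32
r34 = r13 * p41 + r23 * p42 + p43

for label, value in sources.items():
    print(f"{label:24s} {value:+.3f}")
print(f"sum of sources {sum(sources.values()):.3f}, Step 1 value {r34:.3f}")
```

Only source (a) is causal here; (b) through (e) together make up the spuriousness in $r_{34}$.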

Although the investigator can specify all sources of association
between two variables, he usually is interested only in certain kinds.
The sources resulting in spuriousness often are of little interest, but
the sources of association due to the direct or mediated influence of one
variable on another are of considerable interest, since neither is spuri-
ousness in the interpretive sense. In the analysis of $r_{34}$, all sources
of association except $X_3 \to X_4$ (measured by $p_{43}$) are considered to be
sources of spuriousness, simply because there are no variables in the
model intervening between $X_3$ and $X_4$. To understand mediated sources of
association, let us solve for $r_{14}$ in terms of the $p_{ij}$, employing the
procedure used previously to solve for $r_{34}$.



Step 1. Start with the equation for $r_{14}$:

$$r_{14} = p_{41} + r_{12}\,p_{42} + r_{13}\,p_{43}$$

Step 2. Substitute $r_{13} = p_{31} + r_{12}\,p_{32}$:

$$
\begin{aligned}
r_{14} &= p_{41} + r_{12}\,p_{42} + p_{43}\,(p_{31} + r_{12}\,p_{32}) \\
       &= p_{41} + r_{12}\,p_{42} + p_{31}\,p_{43} + r_{12}\,p_{32}\,p_{43}
\end{aligned}
$$

Step 3. Substitute $r_{12} = p_{21}$:

$$r_{14} = p_{41} + p_{21}\,p_{42} + p_{31}\,p_{43} + p_{21}\,p_{32}\,p_{43}$$

Interpreting each term of this equation: (a) $p_{41}$ measures the direct
influence of $X_1$ on $X_4$ ($X_1 \to X_4$); (b) $p_{21}\,p_{42}$ is the association due to the
influence of $X_1$ on $X_4$ mediated through $X_2$ (i.e. $X_1 \to X_2 \to X_4$); (c) $p_{31}\,p_{43}$
is the association due to the influence of $X_1$ on $X_4$ mediated through $X_3$
(i.e. $X_1 \to X_3 \to X_4$); and (d) $p_{21}\,p_{32}\,p_{43}$ is the association due to the
influence of $X_1$ on $X_4$ mediated through the intervening variables $X_2$ and
$X_3$ (i.e. $X_1 \to X_2 \to X_3 \to X_4$). One can see that when intervening variables
are present, the total influence, direct and mediated, of one
variable on another is measured not only by the path coefficient between
them but must also include all the mediated influences. Thus if an inves-
tigator is interested only in the direct and mediated influences of one
variable on another, he need only specify all possible mediating chains,
and he can calculate the association contributed by each one of these by
multiplying the path coefficients corresponding to each pair of variables
in the mediating chain: for example, as indicated above, $X_1 \to X_2 \to X_3 \to X_4$
contributes an association, $p_{21}\,p_{32}\,p_{43}$, between $X_1$ and $X_4$.
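The multiplication rule just described can be automated for any recursive model. This is a minimal sketch that assumes the six male coefficients for equations 1-3 from Table 2. `chains` and `chain_value` are hypothetical helper names, and the recursion terminates only because the model is recursive; with reciprocal causation it would never end. Because FaEd ($X_1$) has no antecedents in the model, every source of $r_{14}$ is a direct or mediated influence, so the sum over all chains should reproduce $r_{14}$ as given by the normal equations.

```python
from typing import Dict, Iterator, List, Tuple

# Directed paths of equations 1-3, keyed (cause, effect), with male coefficients from Table 2.
EDGES: Dict[Tuple[int, int], float] = {
    (1, 2): 0.202, (1, 3): -0.061, (2, 3): 0.526,
    (1, 4): 0.151, (2, 4): 0.219, (3, 4): 0.210,
}


def chains(src: int, dst: int) -> Iterator[List[int]]:
    """Yield every directed chain of variables from src to dst."""
    if src == dst:
        yield [dst]
        return
    for cause, effect in EDGES:
        if cause == src:
            for rest in chains(effect, dst):
                yield [src] + rest


def chain_value(chain: List[int]) -> float:
    """Multiply the path coefficients along one chain."""
    value = 1.0
    for cause, effect in zip(chain, chain[1:]):
        value *= EDGES[(cause, effect)]
    return value


total = 0.0
for chain in chains(1, 4):
    total += chain_value(chain)
    print(" -> ".join(f"X{v}" for v in chain), f"{chain_value(chain):+.4f}")

# X1 has no antecedents, so its total (direct + mediated) influence should equal r14.
r12 = 0.202
r13 = -0.061 + r12 * 0.526
r14 = 0.151 + r12 * 0.219 + r13 * 0.210
print(f"total influence {total:.3f}, r14 {r14:.3f}")
```

For a pair of variables that share antecedents, such as $X_3$ and $X_4$, the same enumeration gives only the causal part of the correlation. The remainder is spuriousness.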

To reiterate, let us examine the correlation of NMSQT ($X_2$) scores with
freshman educational plans (FLOA = $X_4$). In this case HSG ($X_3$) intervenes
between NMSQT and FLOA, and FaEd ($X_1$) is a common, antecedent factor.
Solving the normal equations for $r_{24}$, we obtain:

$$r_{24} = p_{41}\,p_{21} + p_{42} + p_{43}\,p_{21}\,p_{31} + p_{32}\,p_{43}$$
Using the path coefficients for males (shown in Table 2), one may deduce
that $r_{24}$ (.357) has four components due to: (a) the direct influence of
$X_2$ on $X_4$ ($X_2 \to X_4$), or $p_{42} = .219$; (b) the mediated influence of $X_2$ on $X_4$
through $X_3$ ($X_2 \to X_3 \to X_4$), or $p_{32}\,p_{43} = .526(.210) = .110$; (c) the antecedent
variable $X_1$ ($X_2 \leftarrow X_1 \to X_4$), or $p_{41}\,p_{21} = .151(.202) = .031$; and (d) the
chain $X_2 \leftarrow X_1 \to X_3 \to X_4$, which accounts for the remainder, i.e. $p_{43}\,p_{21}\,p_{31} =
.210(.202)(-.061) = -.003$. Accordingly, the interpretation of these data
is that: (a) NMSQT ($X_2$) seems to have a direct influence (independent of
both $X_1$ and $X_3$) on freshman educational plans ($X_4$), as measured by the path
coefficient $p_{42} = .219$; (b) NMSQT has an additional, mediated influence
on FLOA (NMSQT→HSG→FLOA) which is about one half as potent (.110 vs.
.219) as the direct influence; and (c) the total influence of NMSQT on
FLOA, direct and mediated, accounts for .329 (i.e. .219 + .110) of the
.357 correlation of $X_2$ with $X_4$. The remainder (.357 - .329 = .028) is
accounted for by the algebraic sum (.031 - .003 = .028) of the two spurious,
or antecedent, components. It is worth noting that the inclusion of high
school grades (HSG = $X_3$) in the causal model allows us not only to measure
separately the mediated influence, NMSQT→HSG→FLOA, as distinct from
NMSQT→FLOA, but also to specify the indirect source of spuriousness,
NMSQT←FaEd→HSG→FLOA.
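The arithmetic of this worked example can be reproduced directly. The sketch below uses only the male coefficients from Table 2. It splits $r_{24}$ into its causal part (direct plus mediated) and its spurious part (the two antecedent components), mirroring the interpretation given above.

```python
# Male path coefficients from Table 2: X1=FaEd, X2=NMSQT, X3=HSG, X4=FLOA.
p21, p31, p32 = 0.202, -0.061, 0.526
p41, p42, p43 = 0.151, 0.219, 0.210

direct = p42                         # NMSQT -> FLOA
mediated = p32 * p43                 # NMSQT -> HSG -> FLOA
common_cause = p41 * p21             # NMSQT <- FaEd -> FLOA
indirect_spurious = p43 * p21 * p31  # NMSQT <- FaEd -> HSG -> FLOA

r24 = direct + mediated + common_cause + indirect_spurious
influence = direct + mediated              # causal part of r24
spuriousness = common_cause + indirect_spurious
print(f"r24 {r24:.3f} = influence {influence:.3f} + spuriousness {spuriousness:.3f}")
# r24 0.357 = influence 0.329 + spuriousness 0.028
```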

Results 

This study was originally intended to interpret only three correlations:
SELECT with CG, CG with SLOA, and SELECT with SLOA. SELECT has been treated
as antecedent to both CG and SLOA, because SELECT is determined on college
entrance, before CG and SLOA have been determined. The assumption that CG
is antecedent to SLOA is not as easy to defend, because there may be
little or no time lapse between these variables. In order to interpret
the correlations between SELECT, CG, and SLOA, we must consider the
association remaining between these variables after FaEd, NMSQT, HSG,
and FLOA have been controlled. When these latter four variables are
controlled, no assumptions need be made about the causal ordering among
them; in other words, equations 1, 2, and 3 are unnecessary. Equation 4,
in essence, states that before we can study the influence of SELECT on CG
or on SLOA we must control for FaEd, NMSQT, HSG, and FLOA. Since we did
not originally intend to examine the factors influencing SELECT, equation
4 need not have been constructed. The causal model corresponding to
equations 4, 5, and 6 is shown in Fig. 2. Curved lines with arrows at
either end were drawn between FaEd, NMSQT, HSG, and FLOA to indicate that
equations 4, 5, and 6 do not assume any causal ordering among these
variables. Where the path coefficients are less than .10, no arrows were
drawn.

The sources of the correlations between SELECT ($X_5$), CG ($X_6$), and
SLOA ($X_7$) may be derived from the normal equations in the following manner:

Step 1. The normal equation corresponding to equation 6 which involves $r_{57}$ is:

$$r_{57} = r_{15}\,p_{71} + r_{25}\,p_{72} + r_{35}\,p_{73} + r_{45}\,p_{74} + p_{75} + r_{56}\,p_{76}$$

Step 2. The normal equation for $r_{56}$ is (corresponding to equation 5):

$$r_{56} = r_{15}\,p_{61} + r_{25}\,p_{62} + r_{35}\,p_{63} + r_{45}\,p_{64} + p_{65}$$



Fig. 2. Causal model representing equations 4, 5, and 6 for males.
Double-headed, curved arrows represent unanalyzed correlations among the
four background variables, FaEd, NMSQT, HSG, and FLOA. Straight single
arrows represent causal influences in the direction indicated. To em-
phasize the main findings, no arrow was drawn where the path coefficient
representing the strength of a relationship was less than .10 in absolute
magnitude.
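Step 3. Substituting $r_{56}$ into $r_{57}$:

$$
\begin{aligned}
r_{57} &= r_{15}\,p_{71} + r_{25}\,p_{72} + r_{35}\,p_{73} + r_{45}\,p_{74} + p_{75} + r_{15}\,p_{76}\,p_{61} + r_{25}\,p_{76}\,p_{62} + r_{35}\,p_{76}\,p_{63} + r_{45}\,p_{76}\,p_{64} + p_{76}\,p_{65} \\
r_{57} &= r_{15}\,(p_{71} + p_{76}\,p_{61}) + r_{25}\,(p_{72} + p_{76}\,p_{62}) + r_{35}\,(p_{73} + p_{76}\,p_{63}) + r_{45}\,(p_{74} + p_{76}\,p_{64}) + p_{75} + p_{76}\,p_{65}
\end{aligned}
$$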

This equation can be rewritten as $r_{57} = p_{75} + p_{76}\,p_{65} + \text{spuriousness}$,
since we have accounted for the direct effect of $X_5$ on $X_7$ by $p_{75}$ and the
indirect effect of $X_5$ on $X_7$ through $X_6$ by $p_{76}\,p_{65}$. Written this way, the
equation isolates the spurious effects of the background variables in the
four terms weighted by $r_{15}$, $r_{25}$, $r_{35}$, and $r_{45}$. Following the principles
laid down earlier, since there are no intervening variables between
$X_6$ and $X_7$, the influence of $X_6$ on $X_7$ is measured by $p_{76}$, the difference
$r_{67} - p_{76}$ representing spuriousness. Likewise, the influence of $X_5$
on $X_6$ is measured by $p_{65}$, the difference $r_{56} - p_{65}$ representing
spuriousness. It follows that the .228 (for males) correlation of SELECT
with SLOA is due to the direct influence SELECT→SLOA, measured by
$p_{75} = -.010$; the mediated influence SELECT→CG→SLOA, measured by
$p_{65}\,p_{76} = -.210(.174) = -.037$; and the spurious association caused by the
antecedent variables $X_1$, $X_2$, $X_3$, and $X_4$, amounting to .275 (since
$r_{57} = .228 = -.010 - .037 + \text{spuriousness}$). The contributions of both the direct
(-.010) and the indirect (-.037) influences of SELECT actually are quite
small, as therefore is the total influence (-.047) of SELECT on SLOA.
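The male decomposition of the SELECT-SLOA correlation can be checked with a few lines of arithmetic. As in the text, spuriousness is obtained as the remainder after the direct and mediated influences are subtracted. The correlations $r_{15}$ to $r_{45}$ that would let one compute it term by term come from Table 1, which is not reproduced in this section.

```python
# Male values for equations 4-6 (Table 2 and text): X5=SELECT, X6=CG, X7=SLOA.
r57 = 0.228   # correlation of SELECT with SLOA
p75 = -0.010  # direct path SELECT -> SLOA
p65 = -0.210  # SELECT -> CG
p76 = 0.174   # CG -> SLOA

mediated = p65 * p76                  # SELECT -> CG -> SLOA
total_influence = p75 + mediated      # direct + mediated
spuriousness = r57 - total_influence  # remainder attributed to X1..X4
print(f"mediated {mediated:+.3f}, total {total_influence:+.3f}, spurious {spuriousness:+.3f}")
# mediated -0.037, total -0.047, spurious +0.275
```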

The sign of the mediated influence, SELECT→CG→SLOA, is negative just
as Davis (1966) predicted, but its size is trivial. The data for women
also show the indirect influence to be negative and small, i.e. $p_{65}\,p_{76} =
-.208(.134) = -.028$, whereas the direct influence is small and positive:
$p_{75} = +.037$. The direct influence of "selectivity" (SELECT = $X_5$) on
college grades (CG = $X_6$) is negative, as measured by $p_{65} = -.210$ for males
and $-.208$ for females. Davis' assertion that a student attending a more
selective college would have obtained somewhat higher grades at a less
selective college is supported. The direct, positive influence of college
grades on sophomore educational plans is measured by $p_{76} = .174$ for males
and .134 for females. Thus, to the extent that a student obtains better
grades than one would predict (from FaEd, FLOA, HSG, NMSQT, and SELECT),
his educational plans will rise more than will the plans of students who
do only as well as predicted.

It is interesting to note in Fig. 2 that the direct influences on
SLOA of FaEd ($p_{71} = .084$ for males and .018 for females), of NMSQT
($p_{72} = -.025$ for males and .019 for females), and of HSG ($p_{73} = .068$ for
males and .037 for females) are small, which suggests that background
factors have little direct influence on changes in educational
plans during college. As in most studies, HSG is the major predictor of
college grades ($p_{63} = .413$ for males and .456 for females), followed by
test scores such as the NMSQT ($p_{62} = .248$ for males, .243 for females), and
only slightly by FLOA ($p_{64} = .081$ for males, .007 for females) or FaEd
($p_{61} = .028$ for males, .041 for females).

Equation 4 might be interpreted as one kind of description of students
attending the more selective colleges. Thus the path coefficient on SELECT
of FaEd ($p_{51} = .189$ for males, .207 for females) suggests that these
students have more highly educated fathers (HSG, NMSQT, and FLOA controlled),
higher high school grades ($p_{53} = .206$ for males, .009 for females; even
with NMSQT, FaEd, and FLOA controlled), higher NMSQT scores ($p_{52} = .341$ for
males, .422 for females; with HSG, FLOA, and FaEd controlled), and only
slightly higher freshman aspirations ($p_{54} = .092$ for males, .068 for females)
than do other students of comparable background (FaEd, NMSQT, and HSG). It
is not clear why the path coefficient of HSG on SELECT should be negligible
for girls but substantial for boys.

Discussion 

The problem here was to interpret the correlations between a college
characteristic ("selectivity"), a student experience characteristic
(college grades), and changes in educational plans during the freshman
year of college. It was necessary as a first step to assume a causal
ordering among these variables. College "selectivity" was assumed to be
antecedent to college grades and changes in educational plans, because the
type of college attended is determined on entry to college, whereas college
grades and changes in educational plans are ascertained later (i.e. an
effect should not precede a cause in time). College grade average was
treated as antecedent to changes in educational plans, because: (a) the
theory under test, Davis' "relative deprivation," considers college grades
to be an intervening variable between college "selectivity" and changes in
educational plans; and (b) it was found that high school grade average
correlates more highly with sophomore educational plans than freshman
educational plans correlate with college grades, which in cross-lag analysis
is interpreted to mean that grades probably cause educational plans
(i.e. an effect should correlate more highly with an a priori cause than
with a subsequent cause). With this causal ordering and utilizing the
usual assumptions of regression analysis, it was possible to interpret
the correlations between "selectivity," college grades, and educational
plans. Logically one had to consider the characteristics of students
at different colleges as antecedent factors to be controlled, although
no assumptions needed to be made about the causal ordering among the
background variables. Path analysis, which makes explicit the rationale
behind regression analysis, was used.

The results suggested that changes in educational plans are a positive
function of the degree to which a student's academic performance, as
measured by grades, differs from that predicted from his background and the
college he attends. If he does better than expected, he will be more
likely to raise (during the freshman year) his educational sights. The
direct influence of college "selectivity" on educational plans appears to
be small or nonexistent. "Selectivity" does seem to influence college
grades, apparently causing a student at a more selective college to get
lower grades than he would have received at a less selective college. It
follows that only to the extent that a student gets lower grades because
he attends a more selective college will his sophomore educational plans
be lower than they would have been at a less selective college. The
results suggested, however, that this indirect influence of "selectivity"
on educational plans, mediated by college grades, is small and perhaps
trivial, especially when compared with the more general influence of
academic performance on educational plans. Because of all the theoretical
and statistical assumptions necessary to any interpretation, one may
reasonably doubt the validity of these assertions. As Stanley (1966b) and
Campbell and Stanley (1963) indicated, problems arising from the study of
existent college populations make any interpretations very speculative.

On the other hand, it would be even harder to defend a finding that "selec-
tivity" positively influences aspiration, because of the plausible,
alternative hypothesis that such a finding is due to incomplete control for input.

Whether or not the present analysis really is an adequate test of either
the "relative deprivation" or the "environmental press" theory, the pro-
cesses involved in both are assumed to act upon only a minority of persons:
the poor achievers at highly selective colleges and those deviating from
local norms, respectively. The use of Pearson product-moment correlations
therefore may be misleading, because the small correlations simply may
reflect the small number of persons affected. Another problem posed by
this analysis is our assumption that all the independent variables have
been measured without error. Had the appropriate reliability and validity
coefficients been available, they could have been incorporated in the path
analysis. However, since the data indicate no college effect on educational
aspirations, it seems probable that this conclusion would not be modified
by correcting for unreliability of measurement.
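One common way to fold reliability coefficients into an analysis of this kind is to correct each correlation for attenuation before solving the normal equations. This is an assumption on our part; the paper does not say which method it had in mind. The minimal sketch below shows Spearman's correction for a single correlation. The observed correlation and both reliabilities are hypothetical, since none are reported here, and `disattenuate` is an illustrative helper name. A matrix corrected cell by cell can fail to be positive definite, so in practice it should be checked before it is used in the normal equations.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimated correlation between error-free (true) scores."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical inputs: the study reports no reliabilities, so these are illustrative only.
observed = 0.30
print(round(disattenuate(observed, rel_x=0.80, rel_y=0.90), 3))  # 0.354
```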

Summary 

The goal of this study was to demonstrate that path analysis is a
valuable tool when one wishes to interpret correlations in a causal sense.
As Duncan (1966) has said:

The great merit of the path scheme, then, is that it makes
the assumptions explicit, and tends to force the discussion
to be at least internally consistent, so that mutually in-
compatible assumptions are not introduced surreptitiously
into different parts of an argument extending over scores
of pages. With the causal scheme made explicit, moreover,
it is in a form that enables criticism to be sharply focussed
and hence potentially relevant not only to the interpretation
at hand but also, perchance, to the conduct of future inquiry (p. 7).

To provide a concrete example, the technique was applied to a non-
experimental panel survey in an effort to determine whether the more
selective colleges, compared with the less selective ones, had a differential
impact on the educational plans of their students. In order to test this
hypothesis, it was necessary to make assumptions, including: (a) the usual
assumptions of regression analysis (linearity, additivity, normality,
homoscedasticity, and error terms uncorrelated with independent variables);
(b) the assumption of a definite causal sequence for the variables studied,
such that any given variable could be "caused" only by variables causally
prior to it, and could itself be a "cause" only of variables subsequent to
it; (c) no measurement error for any variable, i.e. perfectly valid
indicators; and (d) inclusion of all relevant antecedent factors that would
distort interpretation of any association. The weight of these assumptions
was such that failure to reject the null hypothesis (that "selectivity" of the
college does not influence educational plans) is not convincing evidence
that no such relationship exists, especially in light of the short time
interval (one year) studied. It seems better to risk making these assump-
tions and to draw qualified conclusions from the data than to draw no
conclusions about causality.
If anything, this study demonstrates that it is extremely difficult
to put theories about college environments into testable form because of
the enormous variety of hidden assumptions in the published studies. One
value of path analysis is that many of these assumptions are exposed in
the process of setting up the causal equations. It should be noted that
the assumptions used in our analysis are not necessary to all path analyses.
In a given situation, it may be possible to study problems that contain
correlated error terms, reciprocal causation, or unmeasured variables
(like the true test score). It is almost impossible to test the assump-
tions implicit in a given theoretical model, unless one makes from that
model an a priori prediction which may be contradicted by reality. No such
prediction was made here. Instead, we tried, post hoc, to account for the
total pattern of observed correlations.